condorcet winner
The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits
Saad, El Mehdi, Thuot, Victor, Verzelen, Nicolas
We study best-arm identification in stochastic dueling bandits under the sole assumption that a Condorcet winner exists, i.e., an arm that wins each noisy pairwise comparison with probability at least $1/2$. We introduce a new identification procedure that exploits the full gap matrix $Δ_{i,j}=q_{i,j}-\tfrac12$ (where $q_{i,j}$ is the probability that arm $i$ beats arm $j$), rather than only the gaps between the Condorcet winner and the other arms. We derive high-probability, instance-dependent sample-complexity guarantees that (up to logarithmic factors) improve the best known ones by leveraging informative comparisons beyond those involving the winner. We complement these results with new lower bounds which, to our knowledge, are the first for Condorcet-winner identification in stochastic dueling bandits. Our lower-bound analysis isolates the intrinsic cost of locating informative entries in the gap matrix and estimating them to the required confidence, establishing the optimality of our non-asymptotic bounds. Overall, our results reveal new regimes and trade-offs in the sample complexity that are not captured by asymptotic analyses based only on the expected budget.
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- North America > United States > New York (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > New York (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- North America > United States (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (6 more...)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States (0.04)
- Europe > Germany (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions
Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show $O(\log T)$ anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Leisure & Entertainment > Sports > Tennis (0.47)
- Government > Voting & Elections (0.44)
- Information Technology > Artificial Intelligence > Machine Learning (0.93)
- Information Technology > Data Science > Data Mining > Big Data (0.31)